Keyword Search Result

[Keyword] microphone array(57hit)

21-40hit(57hit)

  • A Robust Sound Source Localization Approach for Microphone Array with Model Errors

    Hua XIAO  Huai-Zong SHAO  Qi-Cong PENG  

     
    PAPER-Speech and Hearing

    Vol: E91-A No:8, Page(s): 2062-2067

    In this paper, a robust sound source localization approach is proposed. The approach retains good performance even when model errors exist. Compared with previous work in this field, the contributions of this paper are as follows. First, an improved broad-band, near-field array model is proposed. It takes array gain and phase perturbations into account and is based on the actual positions of the elements. It can be used with arrays of arbitrary planar geometry. Second, a subspace model errors estimation algorithm and a Weighted 2-Dimension Multiple Signal Classification (W2D-MUSIC) algorithm are proposed. The subspace model errors estimation algorithm estimates the unknown parameters of the array model, i.e., the gain and phase perturbations and the positions of the elements, with high accuracy. The performance of this algorithm improves as the SNR or the number of snapshots increases. The W2D-MUSIC algorithm, based on the improved array model, is used to locate sound sources. Together, these two algorithms make up the robust sound source localization approach. More accurate steering vectors can thus be provided for further processing such as adaptive beamforming. Numerical examples confirm the effectiveness of the proposed approach.

  • Enhancement of Sound Sources Located within a Particular Area Using a Pair of Small Microphone Arrays

    Yusuke HIOKA  Kazunori KOBAYASHI  Ken'ichi FURUYA  Akitoshi KATAOKA  

     
    PAPER-Engineering Acoustics

    Vol: E91-A No:2, Page(s): 561-574

    A method for extracting a sound signal from a particular area that is surrounded by multiple ambient noise sources is proposed. This method performs several fixed beamformings on a pair of small microphone arrays separated from each other to estimate the signal and noise power spectra. Noise suppression is achieved by applying spectrum emphasis to the output of fixed beamforming in the frequency domain, which is derived from the estimated power spectra. In experiments performed in a room with reverberation, this method succeeded in suppressing the ambient noise, giving an SNR improvement of more than 10 dB, which is better than the performance of the conventional fixed and adaptive beamforming methods using a large-aperture microphone array. We also confirmed that this method keeps its performance even if the noise source location changes continuously or abruptly.

  • An Approach to Solve Local Minimum Problem in Sound Source and Microphone Localization

    Kazunori KOBAYASHI  Ken'ichi FURUYA  Yoichi HANEDA  Akitoshi KATAOKA  

     
    PAPER-Engineering Acoustics

    Vol: E90-A No:12, Page(s): 2826-2834

    We previously proposed a method of sound source and microphone localization. The method estimates the locations of sound sources and microphones from only time differences of arrival between signals picked up by microphones even if all their locations are unknown. However, there is a problem that some estimation results converge to local minimum solutions because this method estimates locations iteratively and the error function has multiple minima. In this paper, we present a new iterative method to solve the local minimum problem. This method achieves accurate estimation by selecting effective initial locations from many random initial locations. The computer simulation and experimental results demonstrate that the presented method eliminates most local minimum solutions. Furthermore, the computational complexity of the presented method is similar to that of the previous method.
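The idea of choosing good starting points before iterating can be illustrated on a toy problem. The sketch below is not the authors' localization method: the one-dimensional `error` function, the gradient-descent refinement, and the selection criterion (ranking candidates by their initial error) are all assumptions made for illustration, since the abstract does not state how the effective initial locations are selected.

```python
import random


def error(x):
    """Toy error function: local minimum near x = 0.96, global minimum near x = -1.04."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Derivative of the toy error function."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def descend(x, steps=200, rate=0.01):
    """Plain gradient descent, standing in for an iterative location update."""
    for _ in range(steps):
        x -= rate * gradient(x)
    return x


def multistart_minimize(n_candidates=200, n_keep=5, seed=0):
    """Draw random initial values, keep the most promising, refine them, return the best."""
    rng = random.Random(seed)
    candidates = [rng.uniform(-2.0, 2.0) for _ in range(n_candidates)]
    # Selection step (assumed criterion): rank the candidates by initial error.
    kept = sorted(candidates, key=error)[:n_keep]
    refined = [descend(x) for x in kept]
    return min(refined, key=error)


if __name__ == "__main__":
    print("single start from x = 1.5:", descend(1.5))
    print("multi-start result:", multistart_minimize())
```

A single descent started on the wrong side of the barrier settles in the local minimum, whereas refining only the preselected candidates costs little more than one ordinary run.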

  • Robust Talker Direction Estimation Based on Weighted CSP Analysis and Maximum Likelihood Estimation

    Yuki DENDA  Takanobu NISHIURA  Yoichi YAMASHITA  

     
    PAPER-Speech Enhancement

    Vol: E89-D No:3, Page(s): 1050-1057

    This paper describes a new talker direction estimation method for front-end processing to capture distant-talking speech by using a microphone array. The proposed method consists of two algorithms: One is a TDOA (Time Delay Of Arrival) estimation algorithm based on a weighted CSP (Cross-power Spectrum Phase) analysis with an average speech spectrum and CSP coefficient subtraction. The other is a talker direction estimation algorithm based on ML (Maximum Likelihood) estimation in a time sequence of the estimated TDOAs. To evaluate the effectiveness of the proposed method, talker direction estimation experiments were carried out in an actual office room. The results confirmed that the talker direction estimation performance of the proposed method is superior to that of the conventional methods in both diffused- and directional-noise environments.
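As background for the TDOA step, plain CSP analysis (also known as GCC-PHAT) can be sketched as follows. This is only the unweighted baseline: the average-speech-spectrum weighting, the CSP coefficient subtraction, and the ML stage described in the abstract are not reproduced. The function names and the naive DFT are illustrative choices made to keep the example free of third-party dependencies.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (standard library only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def csp_tdoa(x1, x2, max_lag):
    """Return the lag in samples by which x2 trails x1, taken from the CSP peak."""
    spec1, spec2 = dft(x1), dft(x2)
    whitened = []
    for a, b in zip(spec1, spec2):
        c = b * a.conjugate()
        mag = abs(c)
        # Dividing by the magnitude keeps only the phase of the cross spectrum.
        whitened.append(c / mag if mag > 1e-12 else 0j)
    csp = [v.real for v in dft(whitened, inverse=True)]
    n = len(csp)
    # Negative lags wrap around to the end of the inverse transform.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: csp[lag % n])


if __name__ == "__main__":
    rng = random.Random(0)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(64)]
    x2 = x1[-3:] + x1[:-3]  # x1 delayed circularly by 3 samples
    print("estimated lag:", csp_tdoa(x1, x2, max_lag=8))
```

Given a lag, the microphone spacing, and the sampling rate, a direction estimate would follow from the usual far-field geometry, which is omitted here.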

  • Robust Beamforming of Microphone Array Using H∞ Adaptive Filtering Technique

    Jwu-Sheng HU  Wei-Han LIU  Chieh-Cheng CHENG  

     
    PAPER-Speech/Audio Processing

    Vol: E89-A No:3, Page(s): 708-715

    In ASR (Automatic Speech Recognition) applications, one of the most important issues in the real-time beamforming of microphone arrays is the inability to capture the whole acoustic dynamics with a finite length of data and a finite number of array elements. For example, the reflected source signal impinging from the side-lobe direction presents a coherent interference, and the non-minimum-phase channel dynamics may require an infinite amount of data in order to achieve perfect equalization (or inversion). All these factors appear as uncertainties or un-modeled dynamics in the received signals. Traditional adaptive algorithms such as NLMS that do not consider these errors suffer performance deterioration. In this paper, a time-domain beamformer using an H∞ filtering approach is proposed to adjust the beamforming parameters. Furthermore, this work also proposes a frequency-domain approach called SPFDBB (Soft Penalty Frequency Domain Block Beamformer), also based on H∞ filtering, that can reduce the computational effort and provide purified data to the ASR application. Experimental results show that the adaptive H∞ filtering method is robust to modeling errors and suppresses much more noise interference than the NLMS-based method. Consequently, the correct rate of ASR is also enhanced.

  • Frequency Domain Microphone Array Calibration and Beamforming for Automatic Speech Recognition

    Jwu-Sheng HU  Chieh-Cheng CHENG  

     
    PAPER-Noise and Vibration

    Vol: E88-A No:9, Page(s): 2401-2411

    This investigation proposed two array beamformers SPFDBB (Soft Penalty Frequency Domain Block Beamformer) and FDABB (Frequency Domain Adjustable Block Beamformer). Compared with the conventional beamformers, these frequency-domain methods can significantly reduce the computation power requirement in ASR (Automatic Speech Recognition) based applications. Like other reference signal based techniques, SPFDBB and FDABB minimize microphone's mismatch, desired signal cancellation caused by reflection effects and resolution due to the array's position. Additionally, these proposed methods are suitable for both near-field and far-field environments. Generally, the convolution relation between channel and speech source in time domain cannot be modeled accurately as a multiplication in the frequency domain with a finite window size, especially in ASR applications. SPFDBB and FDABB can approximate this multiplication by treating several frames as a block to achieve a better beamforming result. Moreover, FDABB adjusts the number of frames on-line to cope with the variation of characteristics in both speech and interference signals. A better performance was found to be achievable by combining these methods with an ASR mechanism.

  • Near-Field Sound-Source Localization Based on a Signed Binary Code

    Miki SATO  Akihiko SUGIYAMA  Osamu HOSHUYAMA  Nobuyuki YAMASHITA  Yoshihiro FUJITA  

     
    PAPER-Digital Signal Processing

    Vol: E88-A No:8, Page(s): 2078-2086

    This paper proposes near-field sound-source localization based on cross-correlation of a signed binary code. The signed binary code eliminates multibit signal processing for simpler implementation. Explicit formulae with a near-field assumption are derived for a two-microphone scenario and extended to a three-microphone case with front-rear discrimination. An adaptive threshold for enabling and disabling source localization is developed for robustness in noisy environments. The proposed sound-source localization algorithm is implemented on a fixed-point DSP. Evaluation results in a robot scenario demonstrate that the near-field assumption and front-rear discrimination provide almost 40% improvement in DOA estimation. A correct detection rate of 85% is obtained by a robot in a home environment.
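To illustrate why a signed binary code avoids multibit arithmetic, the sketch below quantizes each sample to its sign and finds the inter-microphone lag by counting sign agreements. It covers only that one idea under simplifying assumptions (integer lags, one fixed block of samples). The near-field formulae, front-rear discrimination, and adaptive threshold from the abstract are not reproduced, and the function names are hypothetical.

```python
import random


def to_sign_code(samples):
    """Quantize each sample to +1 or -1 (a one-bit, signed binary code)."""
    return [1 if s >= 0 else -1 for s in samples]


def sign_xcorr_lag(x1, x2, max_lag):
    """Return the lag of x2 relative to x1 that maximizes sign agreement."""
    c1, c2 = to_sign_code(x1), to_sign_code(x2)
    n = len(c1)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        agree, count = 0, 0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                # A product of +/-1 values reduces to a sign comparison.
                agree += 1 if c1[t] == c2[u] else -1
                count += 1
        # Normalize by the overlap length so large lags are not penalized.
        score = agree / count if count else float("-inf")
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


if __name__ == "__main__":
    rng = random.Random(1)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(200)]
    x2 = [0.0] * 4 + x1[:-4]  # x1 delayed by 4 samples
    print("estimated lag:", sign_xcorr_lag(x1, x2, max_lag=10))
```

On fixed-point hardware the inner comparison maps to an XNOR and a counter, which is the kind of simplification the abstract refers to.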

  • Multiple Signal Classification by Aggregated Microphones

    Mitsuharu MATSUMOTO  Shuji HASHIMOTO  

     
    PAPER-Microphone Array

    Vol: E88-A No:7, Page(s): 1701-1707

    This paper introduces a multiple signal classification (MUSIC) method that utilizes the transfer characteristics of microphones located at the same place, namely aggregated microphones. The conventional microphone array realizes a sound localization system according to the differences in arrival time, phase shift, and sound level among the microphones. Therefore, it is difficult to miniaturize the microphone array. The objective of our research is to build a reliable miniaturized sound localization system using aggregated microphones. In this paper, we describe a sound system with N microphones. We then show that the microphone array system and the proposed aggregated microphone system can be described in the same framework. We apply multiple signal classification to the method that utilizes the transfer characteristics of microphones placed at the same location and compare the proposed method with the microphone array. In the proposed method, all microphones are placed at the same place. Hence, it is easy to miniaturize the system. This feature is considered to be useful for practical applications. Experimental results obtained in an ordinary room are shown to verify the validity of the measurement.

  • Robust Subspace Analysis and Its Application in Microphone Array for Speech Enhancement

    Zhu Liang YU  Meng Hwa ER  

     
    PAPER-Microphone Array

    Vol: E88-A No:7, Page(s): 1708-1715

    A robust microphone array for speech enhancement and noise suppression is studied in this paper. To overcome the target signal cancellation problem of conventional beamformers caused by array imperfections or reverberation effects of the acoustic enclosure, the proposed microphone array adopts an arbitrary model of the channel transfer function (TF) relating microphone and speech source. Since the estimation of the channel TF itself is often intractable, the transfer function ratio (TFR) is estimated instead and used to form a suboptimal beamformer. A TFR estimation method that is robust against stationary or slowly varying noise is proposed based on a signal subspace analysis technique. Experiments using simulated signals and actual signals recorded in a real room illustrate that the proposed method performs well in adverse environments.

  • Blind Source Separation of Convolutive Mixtures of Speech in Frequency Domain

    Shoji MAKINO  Hiroshi SAWADA  Ryo MUKAI  Shoko ARAKI  

     
    INVITED PAPER

    Vol: E88-A No:7, Page(s): 1640-1655

    This paper overviews a total solution for frequency-domain blind source separation (BSS) of convolutive mixtures of audio signals, especially speech. Frequency-domain BSS performs independent component analysis (ICA) in each frequency bin, and this is more efficient than time-domain BSS. We describe a sophisticated total solution for frequency-domain BSS, including permutation, scaling, circularity, and complex activation function solutions. Experimental results of 2×2, 3×3, 4×4, 6×8, and 2×2 (moving sources), (#sources × #microphones) in a room are promising.

  • Interface for Barge-in Free Spoken Dialogue System Combining Adaptive Sound Field Control and Microphone Array

    Tatsunori ASAI  Hiroshi SARUWATARI  Kiyohiro SHIKANO  

     
    LETTER-Speech and Hearing

    Vol: E88-A No:6, Page(s): 1613-1618

    This paper describes a new interface for a barge-in free spoken dialogue system combining an adaptive sound field control and a microphone array. In order to actualize robustness against the change of transfer functions due to the various interferences, the barge-in free spoken dialogue system which uses sound field control and a microphone array has been proposed by one of the authors. However, this method cannot follow the change of transfer functions because the method consists of fixed filters. To solve the problem, we introduce a new adaptive sound field control that follows the change of transfer functions.

  • Adaptive Microphone Array System with Two-Stage Adaptation Mode Controller

    Yang-Won JUNG  Hong-Goo KANG  Chungyong LEE  Dae-Hee YOUN  Changkyu CHOI  Jaywoo KIM  

     
    PAPER-Digital Signal Processing

    Vol: E88-A No:4, Page(s): 972-977

    In this paper, an adaptive microphone array system with a two-stage adaptation mode controller (AMC) is proposed for high-quality speech acquisition in real environments. The proposed system includes an adaptive array algorithm, a time-delay estimator and a newly proposed AMC. To ensure proper adaptation of the adaptive array algorithm, the proposed AMC uses not only temporal information, but also spatial information. The proposed AMC is constructed with two processing stages: an initialization stage and a running stage. In the initialization stage, a sound source localization technique is adopted, and a signal correlation characteristic is used in the running stage. For the adaptive array algorithm, a generalized sidelobe canceller with an adaptive blocking matrix is used. The proposed algorithm is implemented as a real-time man-machine interface module of a home-agent robot. Simulation results show a 13 dB SINR improvement with the speaker sitting 2 m from the home-agent robot. The speech recognition rate is also enhanced by 32% when compared to the single-channel acquisition system.

  • Multiple Regression of Log Spectra for In-Car Speech Recognition Using Multiple Distributed Microphones

    Weifeng LI  Tetsuya SHINDE  Hiroshi FUJIMURA  Chiyomi MIYAJIMA  Takanori NISHINO  Katunobu ITOU  Kazuya TAKEDA  Fumitada ITAKURA  

     
    PAPER-Feature Extraction and Acoustic Modelings

    Vol: E88-D No:3, Page(s): 384-390

    This paper describes a new multi-channel method of noisy speech recognition, which estimates the log spectrum of speech at a close-talking microphone based on the multiple regression of the log spectra (MRLS) of noisy signals captured by distributed microphones. The advantages of the proposed method are as follows: 1) the method requires neither a sensitive geometric layout, nor calibration of the sensors, nor additional pre-processing for tracking the speech source; 2) the system requires very little computation; and 3) the regression weights can be statistically optimized over the given training data. Once the optimal regression weights are obtained by regression learning, they can be utilized to generate the estimated log spectrum in the recognition phase, where close-talking speech is no longer required. The performance of the proposed method is illustrated by speech recognition of real in-car dialogue data. In comparison to the nearest distant microphone and a multi-microphone adaptive beamformer, the proposed approach obtains relative word error rate (WER) reductions of 9.8% and 3.6%, respectively.
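The regression step can be sketched as an ordinary least-squares fit. The example below assumes one frequency bin, a linear model with a bias term, and a normal-equation solver. The abstract does not give the exact model form or the optimization procedure, so these choices and all function names are illustrative.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_regression(distant_logs, close_logs):
    """Least-squares weights (bias first) for one frequency bin.

    distant_logs: one list per frame, holding a log-spectral value per distant microphone.
    close_logs:   the close-talking log-spectral value for each frame (training only).
    """
    rows = [[1.0] + list(frame) for frame in distant_logs]
    d = len(rows[0])
    # Normal equations: (R^T R) w = R^T y.
    rtr = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    rty = [sum(r[i] * y for r, y in zip(rows, close_logs)) for i in range(d)]
    return solve_linear(rtr, rty)


def predict(weights, frame):
    """Estimate the close-talking log spectrum from distant-microphone values."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], frame))


if __name__ == "__main__":
    rng = random.Random(1)
    frames = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(50)]
    targets = [predict([0.5, 0.2, 0.3, 0.4], f) for f in frames]
    print("fitted weights:", fit_regression(frames, targets))
```

After training, only `predict` is needed, which matches the abstract's point that close-talking speech is not required in the recognition phase.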

  • Multistage SIMO-Model-Based Blind Source Separation Combining Frequency-Domain ICA and Time-Domain ICA

    Satoshi UKAI  Tomoya TAKATANI  Hiroshi SARUWATARI  Kiyohiro SHIKANO  Ryo MUKAI  Hiroshi SAWADA  

     
    PAPER

    Vol: E88-A No:3, Page(s): 642-650

    In this paper, single-input multiple-output (SIMO)-model-based blind source separation (BSS) is addressed, where unknown mixed source signals are detected at microphones, and can be separated, not into monaural source signals but into SIMO-model-based signals from independent sources as they are at the microphones. This technique is highly applicable to high-fidelity signal processing such as binaural signal processing. First, we provide an experimental comparison between two kinds of SIMO-model-based BSS methods, namely, conventional frequency-domain ICA with projection-back processing (FDICA-PB), and SIMO-ICA which was recently proposed by the authors. Secondly, we propose a new combination technique of the FDICA-PB and SIMO-ICA, which can achieve a higher separation performance than the two methods. The experimental results reveal that the accuracy of the separated SIMO signals in the simple SIMO-ICA is inferior to that of the signals obtained by FDICA-PB under low-quality initial value conditions, but the proposed combination technique can outperform both simple FDICA-PB and SIMO-ICA.

  • Tracking of Speaker Direction by Integrated Use of Microphone Pairs in Equilateral-Triangle

    Yusuke HIOKA  Nozomu HAMADA  

     
    PAPER

    Vol: E88-A No:3, Page(s): 633-641

    In this report, we propose a tracking algorithm of speaker direction using microphones located at vertices of an equilateral triangle. The method realizes tracking by minimizing a performance index that consists of the cross spectra at three different microphone pairs in the triangular array. We adopt the steepest descent method to minimize it, and for guaranteeing global convergence to the correct direction with high accuracy, we alter the performance index during the adaptation depending on the convergence state. Through some computer simulation and experiments in a real acoustic environment, we show the effectiveness of the proposed method.

  • Iterative Estimation and Compensation of Signal Direction for Moving Sound Source by Mobile Microphone Array

    Toshiharu HORIUCHI  Mitsunori MIZUMACHI  Satoshi NAKAMURA  

     
    PAPER-Engineering Acoustics

    Vol: E87-A No:11, Page(s): 2950-2956

    This paper proposes a simple method for estimation and compensation of signal direction, to deal with relative change of sound source location caused by the movements of a microphone array and a sound source. This method introduces a delay filter that has shifted and sampled sinc functions. This paper presents a concept for the joint optimization of arrival time differences and of the coordinate system of a mobile microphone array. We use the LMS algorithm to derive this method by maintaining a certain relationship between the directions of the microphone array and the sound source directions. This method directly estimates the relative directions of the microphone array to the sound source directions by minimizing the relative differences of arrival time among the observed signals, not by estimating the time difference of arrival (TDOA) between two observed signals. This method also compensates the time delay of the observed signals simultaneously, and it keeps the output signals in phase. Simulation results support the effectiveness of the method.
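A delay filter built from a shifted and sampled sinc function can be sketched as below. The taps are h[n] = sinc(n - D) for a delay of D samples, truncated to a finite length without windowing. The truncation length and function names are assumptions, and the LMS-based joint estimation described in the abstract is not reproduced.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def delay_filter(delay, half_len):
    """Taps h[n] = sinc(n - delay) for n = -half_len .. half_len."""
    return [sinc(n - delay) for n in range(-half_len, half_len + 1)]


def apply_delay(signal, delay, half_len=32):
    """Delay `signal` by a possibly fractional number of samples."""
    taps = delay_filter(delay, half_len)
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for i, h in enumerate(taps):
            src = t - (i - half_len)  # input index paired with tap n = i - half_len
            if 0 <= src < len(signal):
                acc += h * signal[src]
        out.append(acc)
    return out


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 0.05 * t) for t in range(200)]
    shifted = apply_delay(tone, 0.5)
    print("filtered:", shifted[100])
    print("ideal:   ", math.sin(2 * math.pi * 0.05 * 99.5))
```

Because the taps depend smoothly on the delay, the delay itself can be treated as a parameter to adapt, which is presumably what makes this filter form convenient for an LMS-style update.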

  • High-Fidelity Blind Separation of Acoustic Signals Using SIMO-Model-Based Independent Component Analysis

    Tomoya TAKATANI  Tsuyoki NISHIKAWA  Hiroshi SARUWATARI  Kiyohiro SHIKANO  

     
    PAPER-Engineering Acoustics

    Vol: E87-A No:8, Page(s): 2063-2072

    We propose a novel blind separation framework for Single-Input Multiple-Output (SIMO)-model-based acoustic signals using an extended ICA algorithm, SIMO-ICA. The SIMO-ICA consists of multiple ICAs and a fidelity controller, and each ICA runs in parallel under the fidelity control of the entire separation system. The SIMO-ICA can separate the mixed signals, not into monaural source signals but into SIMO-model-based signals from independent sources as they are at the microphones. Thus, the separated signals of SIMO-ICA can maintain the spatial qualities of each sound source. In order to evaluate its effectiveness, separation experiments are carried out under both nonreverberant and reverberant conditions. The experimental results reveal that the signal separation performance of the proposed SIMO-ICA is the same as that of the conventional ICA-based method, and that the spatial quality of the separated sound in SIMO-ICA is remarkably superior to that of the conventional method, particularly for the fidelity of the sound reproduction.

  • Overdetermined Blind Separation for Real Convolutive Mixtures of Speech Based on Multistage ICA Using Subarray Processing

    Tsuyoki NISHIKAWA  Hiroshi ABE  Hiroshi SARUWATARI  Kiyohiro SHIKANO  Atsunobu KAMINUMA  

     
    PAPER-Speech/Acoustic Signal Processing

    Vol: E87-A No:8, Page(s): 1924-1932

    We propose a new algorithm for overdetermined blind source separation (BSS) based on multistage independent component analysis (MSICA). To improve the separation performance, we have proposed MSICA in which frequency-domain ICA and time-domain ICA are cascaded. In the original MSICA, the specific mixing model, where the number of microphones is equal to that of sources, was assumed. However, additional microphones are required to achieve an improved separation performance under reverberant environments. This leads to alternative problems, e.g., a complication of the permutation problem. In order to solve them, we propose a new extended MSICA using subarray processing, where the number of microphones and that of sources are set to be the same in every subarray. The experimental results obtained under the real environment reveal that the separation performance of the proposed MSICA is improved as the number of microphones is increased.

  • Microphone Array with Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator for Speech Enhancement

    Hongseok KWON  Jongmok SON  Keunsung BAE  

     
    LETTER

    Vol: E87-A No:6, Page(s): 1491-1494

    This paper describes a new speech enhancement system that employs a microphone array with post-processing based on a minimum mean-square error short-time spectral amplitude (MMSE-STSA) estimator. To obtain a more accurate MMSE-STSA estimator in a microphone array, modification and refinement procedures are carried out on each microphone output. The performance of the proposed system is compared with that of other methods using a microphone array. Noise removal experiments for white and pink noise demonstrate the superiority of the proposed speech enhancement system over other microphone-array methods in average output SNR and cepstral distance measures.

  • Sound Source Localization Using a Profile Fitting Method with Sound Reflectors

    Osamu ICHIKAWA  Tetsuya TAKIGUCHI  Masafumi NISHIMURA  

     
    PAPER

    Vol: E87-D No:5, Page(s): 1138-1145

    In a two-microphone approach, interchannel differences in time (ICTD) and interchannel differences in sound level (ICLD) have generally been used for sound source localization. But those cues are not effective for vertical localization in the median plane (direct front). For that purpose, spectral cues based on features of head-related transfer functions (HRTF) have been investigated, but they are not robust enough against signal variations and environmental noise. In this paper, we use a "profile" as a cue while using a combination of reflectors specially designed for vertical localization. The observed sound is converted into a profile containing information about reflections as well as ICTD and ICLD data. The observed profile is decomposed into signal and noise by using template profiles associated with sound source locations. The template minimizing the residual of the decomposition gives the estimated sound source location. Experiments show this method can correctly provide a rough estimate of the vertical location even in a noisy environment.

